Midterm 1 Retrospective

Friday, September 19, 2014

Overall Results

Performance on the test was generally strong. The mean was 93.3 and the standard deviation 4.7.

Here is the distribution of results.

[Figure: distribution of midterm scores]

Problem Items

Here is a display of the number of wrong responses, by question number. As you can see, items 22, 23, and 24 were the most difficult.

[Figure: number of wrong responses by question number]

Item 22

Consider two columns of numbers, \(X\) and \(Y\), each with \(N\) entries. The sum of cross-products of their deviation scores is

\[ \sum_{i=1}^{N}\left( X_{i}-\overline{X}_{\bullet }\right) \left( Y_{i}-\overline{Y}_{\bullet }\right) =SCP \]

The correct answer was "D" because \(SCP\) is always equal to all of the following quantities:

\(\sum_{i=1}^{N}\left( X_{i}-\overline{X}_{\bullet }\right) \left( Y_{i}\right) \)

\(\sum_{i=1}^{N}\left( X_{i}\right) \left( Y_{i}-\overline{Y}_{\bullet }\right) \)

\(\sum_{i=1}^{N}X_{i}Y_{i}-\left( \sum_{i=1}^{N}X_{i}\sum_{i=1}^{N}Y_{i}\right) /N\)
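As a check you can do by hand, here is a minimal sketch with a tiny data set. The values of `X` and `Y` are made up for illustration (chosen so that both means are exactly 3), and the names `form1` through `form3` are just labels for the three formulas in the order listed above.

```r
# Hypothetical integer data, chosen so the arithmetic is easy to verify by hand
X <- c(1, 2, 3, 6)   # mean 3, deviations -2 -1 0 3
Y <- c(2, 1, 4, 5)   # mean 3, deviations -1 -2 1 2
N <- length(X)

SCP   <- sum((X - mean(X)) * (Y - mean(Y)))   # 2 + 2 + 0 + 6
form1 <- sum((X - mean(X)) * Y)               # X deviations times raw Y: -4 - 1 + 0 + 15
form2 <- sum(X * (Y - mean(Y)))               # raw X times Y deviations: -1 - 4 + 3 + 12
form3 <- sum(X * Y) - sum(X) * sum(Y) / N     # 46 - (12)(12)/4
c(SCP, form1, form2, form3)                   # all four equal 10
```

Because every intermediate value is an integer, the four results agree exactly here, with no floating-point rounding involved.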

Item 22 (continued)

It is easy to demonstrate numerically with R that the 4 expressions yield the same value for every data set you try.

While this isn't a proof, you can run the routine on the next page dozens of times with different sample sizes, and all 4 values will keep coming out the same. That is a very strong hint about the answer!

Item 22 (continued)

Here is a function that computes all 4 quantities on sets of random numbers. Run it as many times as you wish.

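```r
set.seed(12345)

# Compute SCP and the three candidate formulas on n pairs of random normal scores
compareSCP <- function(n) {
  X <- rnorm(n)
  Y <- rnorm(n)
  Xbar <- mean(X)
  Ybar <- mean(Y)
  SCP <- sum((X - Xbar) * (Y - Ybar))             # definition of SCP
  QuantityA <- sum(Y * (X - Xbar))                # X deviations times raw Y
  QuantityB <- sum(X * (Y - Ybar))                # raw X times Y deviations
  QuantityC <- sum(X * Y) - sum(X) * sum(Y) / n   # computational formula
  print(c(SCP, QuantityA, QuantityB, QuantityC))
}
compareSCP(10)
```

```
## [1] -1.675 -1.675 -1.675 -1.675
```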
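Rather than rerunning the function by hand, a loop can do the "dozens of times with different sample sizes" for you. This is a minimal sketch; the helper `scp_forms`, the seed, and the list of sample sizes are illustrative choices, not part of the original notes. It records the largest spread among the four values across all runs.

```r
# Four expressions for the sum of cross-products, returned as a named vector
scp_forms <- function(X, Y) {
  N <- length(X)
  c(SCP = sum((X - mean(X)) * (Y - mean(Y))),
    A   = sum((X - mean(X)) * Y),
    B   = sum(X * (Y - mean(Y))),
    C   = sum(X * Y) - sum(X) * sum(Y) / N)
}

set.seed(2014)
max_gap <- 0
for (n in c(2, 5, 10, 50, 100, 1000)) {       # several sample sizes
  for (trial in 1:20) {                       # 20 random data sets at each size
    v <- scp_forms(rnorm(n), rnorm(n))
    max_gap <- max(max_gap, diff(range(v)))   # spread among the four values
  }
}
max_gap   # tiny: the four values differ only by floating-point rounding
```

As with the single run, this is evidence rather than proof; the algebra below is what settles the question.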

Item 22 (continued)

You can also prove it analytically. The third formula is the computational formula for SCP given in the class notes for covariance.

Moreover, if either of the first two formulas is correct, the other must be correct too, because the choice of which column is labeled \(X\) and which is labeled \(Y\) is arbitrary.

We'll show with summation algebra that the second formula is equal to \(SCP\).

Item 22 (continued)

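Start by multiplying each part of the first factor, \(X_i\) and \(-\overline{X}_\bullet\), by the second factor:

\[\sum_{i=1}^{N}(X_i - \overline{X}_\bullet)(Y_i - \overline{Y}_\bullet) = \sum_{i=1}^{N}(X_i(Y_i - \overline{Y}_\bullet) - \overline{X}_\bullet(Y_i - \overline{Y}_\bullet))\]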

By the distributive rule, the right side is equal to

\[\sum_{i=1}^{N}X_i(Y_i - \overline{Y}_\bullet) - \sum_{i=1}^{N}\overline{X}_\bullet(Y_i - \overline{Y}_\bullet)\]

By the second constant rule, the far right term simplifies, yielding

\[\sum_{i=1}^{N}X_i(Y_i - \overline{Y}_\bullet) - \overline{X}_\bullet\sum_{i=1}^{N}(Y_i - \overline{Y}_\bullet)\]

Item 22 (continued)

Now we see that the far right term involves the sum of the \(Y\) deviation scores, which is always equal to zero. So the entire right term drops out, leaving only the left term, which is the second formula:

\[\sum_{i=1}^{N}X_i(Y_i - \overline{Y}_\bullet) - \overline{X}_\bullet \cdot 0 = \sum_{i=1}^{N}X_i(Y_i - \overline{Y}_\bullet)\]
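As a numerical companion to the proof (not a substitute for it), this sketch checks the facts used in the last two steps on random data: the \(Y\) deviations sum to zero, so the dropped term vanishes, and what remains equals \(SCP\). The seed and the sample size of 25 are arbitrary; results are zero only up to floating-point rounding.

```r
set.seed(1)
X <- rnorm(25)
Y <- rnorm(25)

dev_sum <- sum(Y - mean(Y))                   # sum of Y deviation scores
dropped <- mean(X) * sum(Y - mean(Y))         # the far right term in the derivation
lhs <- sum((X - mean(X)) * (Y - mean(Y)))     # SCP
rhs <- sum(X * (Y - mean(Y)))                 # the surviving left term

c(dev_sum, dropped, lhs - rhs)                # all three are essentially zero
```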

Item 23

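The answer is "A" because the sample variance cannot be expressed as a linear combination of the scores in \(X\): it is built from squared deviation scores, so it is a quadratic, not a linear, function of the \(X_i\). All the other expressions are linear combinations of the scores in \(X\), as we show below: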

Sample mean (all weights \(1/N\))

\[ \overline{X}_\bullet = \sum_{i=1}^{N}\frac{1}{N}X_i \]

Sum of the \(X_i\) (all weights \(1\))

\[\sum_{i=1}^N X_i = \sum_{i=1}^N (1) X_i\]

Item 23 (Continued)

Twice the sum of the \(X_i\) (all weights \(2\))

\[ 2\sum_{i=1}^N X_i = \sum_{i=1}^N (2) X_i\]
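The weight-vector view translates directly into R. In this sketch the scores in `X` are hypothetical and `lc` is an illustrative helper, not a standard function. It also shows one concrete way the variance fails to be linear: any linear combination \(L\) satisfies \(L(2X) = 2L(X)\), but doubling the scores multiplies the variance by 4. R's `var()` uses the \(N-1\) divisor; the factor of 4 holds for either divisor.

```r
# Hypothetical scores, chosen so the arithmetic is easy to check
X <- c(4, 7, 1, 8)
N <- length(X)

# A linear combination is sum(w * X) for a fixed weight vector w
lc <- function(w, X) sum(w * X)
lc(rep(1 / N, N), X)   # sample mean, weights 1/N: 5
lc(rep(1, N), X)       # sum, weights 1: 20
lc(rep(2, N), X)       # twice the sum, weights 2: 40

# Linear combinations scale with the scores: L(2X) = 2 L(X)
lc(rep(1, N), 2 * X) == 2 * lc(rep(1, N), X)   # TRUE
# The variance does not: doubling the scores multiplies it by 4, not 2
c(var(X), var(2 * X))                          # 10 and 40
```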

Item 24

To process this expression, you must examine it very closely.

\[\sum_{j=1}^4 \sum_{i=1}^j X_{ij}\]

Note that we begin by setting \(j=1\), putting us inside the first column. Then we run the row subscript \(i\) from 1 to the current value of \(j\), which is 1. This means that, so far, we have selected \(X_{11}\).

Next, we set \(j = 2\), putting us in the second column, and run the row subscript from 1 to 2. At this point we have added \(X_{11} + (X_{12} + X_{22})\).

Next we set \(j=3\) and run \(i\) from 1 to 3, and finally we set \(j=4\) and run \(i\) from 1 to 4.
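The two summation signs behave exactly like two nested loops, with the inner loop's upper limit tied to the outer loop's current value. This sketch marks each cell of a 4 by 4 grid that the double sum visits with a 1; the name `visited` is just for illustration.

```r
# Mark each cell X[i, j] that sum_{j=1}^{4} sum_{i=1}^{j} visits with a 1
visited <- matrix(0, nrow = 4, ncol = 4)
for (j in 1:4) {        # outer sum: columns 1 to 4
  for (i in 1:j) {      # inner sum: rows 1 to the current j
    visited[i, j] <- 1
  }
}
visited   # the 1s fill the main diagonal and everything above it
```

Ten cells are visited in all, and none of them lies below the diagonal.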

Item 24 (Continued)

This means that we have added all the upper triangular elements of \(X\), i.e., those on or above the main diagonal, for which \(j \ge i\). So the result is

\[ (X_{11}) + (X_{12} + X_{22}) + (X_{13} + X_{23} + X_{33}) + (X_{14} + X_{24} + X_{34} + X_{44}) \]

With the values of \(X\) given on the exam, this is equal to

\[ (3) + (3 + 9) + (8 + 8 + 2) + (5 + 9 + 4 + 4) = 3 + 12 + 18 + 22 = 55 \]
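The same total can be reproduced in R. In this sketch the on-or-above-diagonal entries are read off the worked answer; the exam's entries below the diagonal are not shown here, so they are `NA` placeholders. If the loop ever touched one of them, the total would become `NA`, which makes this a direct check that the double sum never reaches below the diagonal.

```r
# Entries on or above the diagonal, taken from the worked answer;
# below-diagonal entries are unknown here, so they stay NA
X <- matrix(NA_real_, nrow = 4, ncol = 4)
X[1, 1]   <- 3
X[1:2, 2] <- c(3, 9)
X[1:3, 3] <- c(8, 8, 2)
X[1:4, 4] <- c(5, 9, 4, 4)

# Literal translation of sum_{j=1}^{4} sum_{i=1}^{j} X[i, j]
total <- 0
for (j in 1:4) {
  for (i in 1:j) {
    total <- total + X[i, j]
  }
}
total                                # 55

# Equivalent one-liner using R's upper-triangle index (diagonal included)
sum(X[upper.tri(X, diag = TRUE)])    # 55
```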